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1. Motivations and preliminaries 

In probability theory, the exact calculation of the distribution function of the sum of 
d dependent random variables Xi , . . . , Xd is a rather onerous task. Even assuming the 
knowledge of the joint distribution H of the vector {Xi , Xd) , one often has to rely on 
tools like Monte Carlo and quasi-Montc Carlo methods. All of these techniques warrant 
considerable expertise and, more importantly, need to be tailored to the specific problem 
under consideration. In this paper, we introduce a numerical procedure, called the AEP 
algorithm (from the names of the authors), which accurately calculates 

¥[Xi + ---+Xd<s] (1.1) 

at a fixed real threshold s and only uses the joint distribution H without the need for 
any specific adaptation. 

Problems like the computation of (1.1) arise especially in insurance or finance when 
one has to calculate an overall capital charge in order to offset the risk position Sd = Xl-\- 
■ ■ ■ + Xd deriving from a portfolio of d random losses with known joint distribution H. 
The minimum capital requirement associated to Sd is typically calculated as the value-at- 
risk (i.e., quantile) for the distribution of Sd at some high level of probability. Therefore, 
the calculation of a VaR-based capital requirement is equivalent to the computation of 
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the distribution of Sd (see (l-l))- For an internationally active bank, this latter task 
is required, for example, under the terms of the New Basel Capital Accord (Basel II); 
see [4]. 

An area of application in quantitative risk management where our algorithm may be 
particularly useful is stress-testing. In this context, one often has information on the 
marginal distributions of the underlying risks, but wants to stress-test the interdepen- 
dence between these risks; a concept that enters here is that of the copula. Especially 
in the context of the current (credit) crisis, flexibility of the copula used when linking 
marginal distributions to a joint distribution has no doubt gained importance; see, for 
instance, [8]. 

Although the examples treated in this paper are mainly illustrative, the dimension 
d (<5), the marginal assumptions and the dependence structure (Clayton and Gumbel 
copula) used are typical for risk management applications in insurance and finance. For 
more information on this type of question, see, for instance, [1, 5, 20]. 

In the following, we will denote (row) vectors in boldface, for example, 1 = (1, . . . , 1) S 
W', d> 1. Gfc represents the kth vector of the canonical basis of M.'^ and D — {1, . . . ,d}. 
Given a vector b = {bi,...,bd) G M.'^ and a real number h, Q{h,h) C R'^ denotes the 
hypercube defined as 



X(6fc,6fc + /i], if/i>0, 
X{bk + h,bk], iih<0. 



(1.2) 



For notational purposes, we set Q(b,0) = 0. On some probability space (ri,2l, P), let 
the random variables Xi, . . . , Xd have joint d-variate distribution H; H induces the 
probability measure Vh on M'* via 



Vh 



X {-0O,Xi\ 
i=l 



H{xi,...,Xd). 



We denote by io, . . . , iw all the 2'^ vectors in {0, 1}'*, that is, io = (0, . . . , 0), ifc =ek,k £ D, 
and so on, i^v = 1 = (1, • • • , 1), where iV = 2^* — 1. By #i = X]fc=i ^k, we denote the number 
of I's in the vector i, for example, #io = 0,#iAr = d. The V/f-measure of a hypercube 
Q{h,h), h>0, can also be expressed as 



N 

VH[Q{h, h)] ^ F[Xk e{bk,bk + h],keD] = Y,{-'^ f~*''H{h + hi,). (1.3) 

3=0 

The case h < is analogous. If necessary, (1.3) can also be expressed in terms of the 
survival function H =1 — H . Moreover, iS(b, h) C R'* denotes the d-dimensional simplex 
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defined as 



S{h,h) = < 



( 
( 



X e M'' : Xfc - 6fc < 0, fc e D and y^(xfc 



x eW^ : Xk - bk > 0, k e D and ^(xfe 



k=l 
d 



d 



bk) > h 



bk) < h 



] 
1 



if h > 0, 



if h < 0. 



(1.4) 



Again, S{h, 0) = 0. Finally, we denote by the Lebesgue measure on R''. For instance, 
the Lebesgue measure of the simplex 5(b, h) is given by 



2. Description of the AEP algorithm for d = 2 

Throughout the paper, we assume the random variables Xi, . . . ^Xd to be non- negative, 
that is, P[Xk < 0] = 0, fc e -D. The extension to random variables bounded from below 
is straightforward and will be illustrated below. We assume that we know the joint 

distribution H of the vector {Xi, . . . , Xd) and define Sd = Xi H 1- Xd- Our aim is then 

to numerically calculate 



for a fixed positive threshold s. 

Due to (1.3), it is very easy to compute the Vff-measure of hypercubes in W^. The idea 
behind the AEP algorithm is then to approximate the simplex 5(0, s) by hypercubes. 
Before proceeding to the general case, we first illustrate our method for dimension d = 2. 

As illustrated in Figure 1, the Vff -measure of the simplex Si =S{0, s) can be proxied by 
the Vf/ -measure of the hypercube Q\ = Q{0,as) with a £ [1/2, 1). The error committed 



Xd[S{h,h)]^LL, 



(1.5) 



nSd<s] = V„[S{0,s)] 



as 




Figure 1. Decomposition (2.1) of the two-dimensional simplex 5(0, s). 
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by using this approximation can be expressed in terms of the measure of the three 
simplexes 

5] =5((0,as),(l -a)s), ^ S{{as,0), {1 - a)s) and 

cSf =S{{as,as),{l~2a)s). 

Formally, we have 

<S(0,S) = (Q}U5]U5|)\5| for aU a e [1/2,1). (2.1) 

Since a e [1/2, 1), the sets iSjjiSi and are pairwise disjoint. Also, note that 5| C Q}. 
The Vff-measure of 5(0, s) can thus be written as 

Vh [5(0, s)] = Vh[QI] + Vh [S^] + Vh [Si] - Vh [SI] . 

With the notation S2 — — I and S2 = — 1 , we translate the equation above into 

3 

Vh[S{0, s)] ^ Vh[QI] + J2 4Vh[S^]. (2.2) 

Using (1.3); a first approximation of Vff [5(0,s)] is given by the value 

Piis) = Vh[QI] = H{as, as) - H{0, as) - H{as, 0) + F(0, 0). 

Using (2.2), the error committed by considering Pi{s) instead of Vh[S{Q,s)] can be 
expressed in terms of the Vy-measure of the three simplexes S2 defined above, that is, 

3 

VH[SlS),s)]-P^{s)=Y,slVH[Si^]. (2.3) 
fc=i 

At this point, we can apply to each of the 51 's a decomposition analogous to the one given 
in (2.2) for Sl = 5(0, s), in order to obtain a better approximation of their measures and 
hence of the measure of S\. The only difference between the first and the following step 
is that we have to keep track of whether the measure of a simplex has to be added to or 
subtracted from the next approximation, P2{s), of Vh[S{0,s)]. The value S2, associated 
to each simplex 82, indicates whether the corresponding measure is to be added {s\ = 1) 
or subtracted (sj = —1). The next approximation, ^2(5), will be defined such that the 
difference Vh[S{Q, s)] — ^2(5) is the sum of the V/^-measures of a total of nine simplexes 
produced by the decompositions of the three 52 's. The nine simplexes are then passed 
as input to the third iteration and so on. 

Before formally defining the algorithm in arbitrary dimension d, it is important to 
make the following points. 

• We will prove that the set decomposition (2.1) holds analogously in arbitrary dimen- 
sion d for every choice of a G [l/d, 1). Unfortunately, the simplexes S'^j^^ generated 
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at the nth iteration of the algorithm are, in general, not disjoint for d > 2. This will 
imply a more complicated formula for the general Vf^-measure decomposition. 
• Equation (2.2) depends on the choice of a. In Section 4, we will study an optimal 
choice for a. 



3. Description of the AEP algorithm for arbitrary d 

Recall that in Section 1, we denoted by ig, . . . , Iat all of the 2'^ vectors in {0, 1}'*, where 
= 2'* - 1. Also, let a € [l/d,l). At the b eginning of the nth iteration (n G N), the 

algorithm receives as input N"^^ simplexes which we denote by =5(b^,/iJ^), for 

fc = 1, . . . ,iV"^^. To each simplex, we associate the value sj^ £ {— 17 1}, which indicates 

whether the measure of the simplex has to be added (sf^ = 1) or subtracted (sj^ = —1) in 

order to compute an approximation of Vh[S{0, s)]. 

Each simplex 5^' is then decomposed via one hypercubc ~ Q(b^j,a/i^) and N 

simplexes S^^i — S{h'|^_^_l,h^J^l). In Appendix 8, we prove the rather technical result 

that the -measure of each simplex can be calculated as 

N 

Vh[S^] = VnlQi] + m^VH[S^^^''+'], (3.1) 

where the sequences b^, /i^ and are defined by their initial values b} = 0, ft,} = s and 
b^^T"'+-'' = b,'; + ah'^i,, C+r"'+-'' = (1 - 

(3.2) 

if #ij < 1/a, 
if #i, = 1/a, 
if #ij > 1/a 

for all j = 1, . . . , and k — 1, . . ., A^""-'^ . At this point, we note that by changing the value 
b}, one can apply the algorithm to the case in which the random vector (Xi, . . . ,Xd) 
also assumes negative values, but is still bounded from below by h\. 

We define the sequence P„ (s) as the sum of the V/^-measures of the Qj; , multiplied by 
the corresponding s^, 

P„(s) = P„_i(s)+ J2 ^nVHm = J2 E (3-3) 

fc=l 1=1 k=l 

where Po{s) ~ and the sj^ arc defined by = 1 and 

gNk-N+j ^ ^k^j ah j 1, . . . , iV and fc = 1, . . . , N"-\ (3.4) 

We will show that, under weak assumptions on iJ, the sequence P„(s) converges to 
Vh[S{0, s)]. Moreover, from (1.3), P„(s) can be calculated in a straightforward way. The 
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(A^" ^) X N ^ iV" simplexes S^^^ generated by (3.1) are then passed to the {n + l)th 
iteration in order to approximate their Vff -measures with the measures of the hypercubes 

Qn+l- 

As a first step to show that P„(s) tends to Vh[S{0, s)], we calculate the error by using 
P„(s) instead of Vh[S{0,s)]. 

Theorem 3.1. With the notation introduced above, we have that 

VH[S{0,s)]-K{s) = Y.'n+iVH[S^+^] forallneN. (3.5) 

fe=i 



Proof. We prove the theorem by induction on n. Note that for n = 1, (3.5) corresponds 
to (3.1). Now, assume by induction that 



N+j] 



k=l 

which, recalling (3.1), (3.3) and (3.4), yields 

JV"-1 N"-^ / N 

k=l k=l \j = l / 

AT"-! AT 

k=l 3 = 1 

Ar"-i AT AT" 

=Pn(s)+ E E^"+^"''^^■^«['5,f+Y''^^]-^n(.)+E^'^^ 

i=i k=i n 



We are now ready to give a sufficient condition for the convergence of the sequence 
Pn{s) to Vff[5(0,s)]. The idea of the proof is that if the total Lebesgue measure of the 
new N simplexes S^j^i , j — 1, . . . , N, generated by the simplex Sl^, is smaller than 
the Lebesgue measure of itself, then, by assuming continuity of H, the error (3.5) will 
go to zero. Let us define e„ = X]fe=i ^d[<Sn+i] to be the sum of the Lebesgue measure of 
the simplexes passed to iteration n + 1. We define the volume factor f{a) to be the ratio 
between the sum of the Lebesgue measure of the simplexes in two subsequent iterations, 
that is, /(a) = e„/e„_i. Recalling the formula (1.5) for the A^-measure of a simplex, we 
have that 

^ ,^Nk-N+j,_^ lil-j^i.aKl" _^fd\ |l-ja|1fe^|'^ 

l^MSn+l J d\ • 
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Observing that the A'' simplexes S^^^ , j = 1, . . . , A^, (|^) are generated by the simplex 
5,^, we use the above equation to conclude that 



/(«) 



(i/^!)Ef:rife:;l'^E,ti(,')|i-j^l 



A sufficient condition for the convergence of the AEP algorithm can then be expressed 
in terms of the volume factor f{a). We first assume H to be absolutely continuous with 
a bounded density. 

Theorem 3.2. Assume that Vh has a bounded density vh- If the volume factor satisfies 
f{a) < 1, then 



lim P,-,{s) = Vh[S{0,s)]. 



(3.6) 



Proof. Since Vh has a density vh bounded by a constant c > 0, using (3.5), we have 
that 



\VH[SiO,s)]-Pn{s)\ 



k=l 
k=l 

= c 



Sn+lCdXd I \Sn+l\dX 

E/, d\d = cY,M'Sn+i]^cen. 

I — 1 SS, I 1 /, — 1 



k=l 



We conclude by noting that since > and e„/e„_i = f{a) < 1 by assumption, e„ goes 
to zero exponentially in n. □ 

In order for (3.6) to hold, it is sufficient that vh is bounded on U^i'^n+i fo^' " large 
enough. Define the curve Tg as 



r, = |(a;i,...,Xd)GM'^:^^Xfc=s|. 



(3.7) 
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The following theorem states that the L^-distance from the curve Tg to each point in 
Ufe=i '^n+i is bounded by a factor 7^5, where 7 = max{l — a, |1 — dal}. When a G (0, 2/d), 
we have 7 < 1 and that this distance goes to zero as n ^ 00. For Theorem 3.2 to hold 
when a G {0,2/d), it is then sufficient to require that H has a bounded density only in a 
neighborhood of Tg- We will discuss this assumption further in Section 8. 

Theorem 3.3. If ^ & [J^^i S^^i , then its -distance from the curve Tg is bounded by 
7"s, with 7 ~ max{l — a, |1 — da\}. 

Proof. Wc denote by fefj'*" (resp., ip for r £ D the d components of the vectors (resp., 
ij). Wc prove by induction on n that 

d 

J^^n^' + ^n^s for all fc = l,...,Af"-i and n> 1. (3.8) 

r=l 

For n = l, the statement is true since there is only one simplex with b} = and h\= s. 
Now, assume the statement holds for n> 1. By (3.2), wc have that, for all j = 1, . . . , 
and fc = l,...,A^"-i, 

d 

E,Nk-N+3,r ,Nk-N+j 
"ri+1 ' "-n+l 

r=l 

d d d 

= 5](6^-'- + a/i^p + (1 - #i,a)h^ = b'^^'- + ah'^ + < - ^^n^, 

r— 1 r— 1 r— 1 

d d 
= + «/ln#ij ah^^#i, = ^ b';f + = S, 



where the last equality is the induction assumption. Due to (3.8), every simplex 5^+^ 
generated by the AEP algorithm has its diagonal face lying on the curve F^. As a con- 
sequence, the L^-distance from F^ of each point in S'^j^i is strictly smaller than the 
distance of the vector h^j^^, which is For a fixed n and fc = 1, . . .N'^~^, we have 

that |C+7^+^'| < 7|/i^| for all j^l,...,N. Hence, 

^^max^J/i^;+i|=7"/»}=7"s, (3.9) 

where, for every n > 1, equality holds since we have ^^-^j = 7|/iJ"j| for j = 1 or 

j = N. □ 



4. Choice of cx 



As already remarked, the AEP algorithm depends on the choice of the parameter a. It is 
important to note that, in general, an optimal choice of a would depend on the measure 



570 



P. Arhenz, P. Embrechts and G. Puccetti 



Vh- In the proof of Theorem 3.2, we have shown that 

\Pn{s)-VH[S{0,s)]\<Cf{ar, 

where C is a positive constant. Since we want to keep our algorithm independent of the 
choice of the distribution H, we suggest using the a* which minimizes f{o:), that is, 

a* = argmin /(a) = - — -. 
ae[i/d,i) d+1 

For dimensions d <7, some values of a* and the corresponding optimal volume factors 
f{a*) are given in Table 1. 

We will show that using a* has several desirable consequences. First, when a ~ a* 
and the dimension d is odd, in the measure decomposition (3.1), a number of ( i)/2 ) 

simplexes have the corresponding coefficient m-' equal to zero and can therefore be ne- 
glected, increasing the computational efficiency of the algorithm. For example, in the 
decomposition of a three-dimensional simplex, the algorithm generates only 4 new sim- 
plexes at every iteration with a = a* , instead of the 2^^ — 1 = 7 generated with any other 
feasible value of a. Hence, for a = a* , the number of new simplexes generated at each 
step is given by the function 

— 1, if d is even, 

sec Section 5 for further details on this. 
Since we have that (proof of Theorem 3.3) 



(o,+cx))'^n y s^^ c5(o. 



(l-f7")5)\'5(0,(l-7'», (4.2) 



the choice of a = a* will be convenient. Note that, when a = a* G {0,2/ d), we have 
that 7 < 1 and 7"s goes to zero as n oo. In order to guarantee the convergence of 
the sequence P„, it is then sufficient to require that the distribution H has a bounded 
density only in a neighborhood of F^. Moreover, it is straightforward to see that a* also 
minimizes 7. 



Table 1. Values for a* and f{a*) for dimensions d<7 



d a* f{a*) d a' f{a') 

9 1 I i M 

^33 3 27 

3 i i 6 I >1 

4 2 J3 7 1 >i 
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Figure 2. The decomposition of a simplex by tlie AEP algoritlim in the case d = 2. 

As illustrated in Table 1, Theorem 3.2 states the convergence of the sequence Pn{s) 
when d <b. Various elements affect the speed at which P„(s) converges. First, in order 
to seriously affect the convergence rate of P„(s), it is, in general, always possible to put 
probability mass in a smooth way in a neighborhood of the curve Tg . For the distributions 
of financial and actuarial interest used in Section 6, the algorithm performs very well; slow 
convergence is typically restricted to more pathological cases, such as those illustrated 
in Section 8. We also have to consider that, for the same distribution i7, it is, in general, 
required to compute the distribution of Sd at different thresholds s. Problems such as 
those described in Section 8 may then occur only at a few points s. 

A more relevant issue is the fact that the memory required by the algorithm to run 
the nth iteration increases exponentially in n. At each iteration of the algorithm, every 
simplex iS^ produces one hypercube and a number fs{d) of new simplexes to be passed 
to the following iteration; see (4.1). The computational effort in the {n — l)th step thus 
increases as 0(/s((i)"). While the dimensions d<5 are manageable, as reported in Sec- 
tion 6, the numerical complexity for d >6 increases considerably and quickly exhausts 
the memory of a standard computer. 

Finally, choosing a = a* also allows the accuracy of the AEP algorithm to be increased 
and, under slightly stronger assumptions on H, will lead to convergence of AEP in higher 
dimensions; see Section 5. 

We now give some examples of the first step {n = 1) of the measure decomposition (3.1) 
obtained by choosing b = 0, s ~ 1, a ~ a* , for d~2,3: 

• in the case d = 2, with a = 2/3, we obtain (see Figure 2) 

^^[^((0, 0), 1)] = Vh[Q{{0, 0), 2/3)] + VniSm 2/3), 1/3)] 

+ Fh[5((2/3,0), 1/3)] - Fh[5((2/3, 2/3), -1/3)]; 

• in the case d = 3, with a = 1/2, wc obtain (see Figure 3) 

VhISHO, 0, 0), 1)] = VniQm 0, 0), 1/2)] + Vh[S{{1/2, 0, 0), 1/2)] 

+ Vh[S{{0, 1/2, 0), 1/2)] + VniSm 0, 1/2), 1/2)] 
-V}f[5((l/2,l/2,l/2),-l/2)]. 
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Figure 3. The decomposition of a simplex by tlie AEP algoritlim in the case d = 3. 



5. An improvement of the numerical accuracy of the 
algorithm via extrapolation 

In this section, we introduce a method to increase the accuracy of the AEP algorithm. 
This method is based on the choice a = a* , as discussed in Section 4. To this end, we 
will make the stronger assumption that the joint distribution H has a twice continuously 
differentiable density vh, with bounded derivatives. This will allow us to approximate the 
density vh by its linear Taylor expansion, providing a good estimate of the approximation 
error of AEP after a number of iterations. 

We first need two simple integration results. Denoting by Sd~i a simplex in dimension 
(d — 1), for all s > 0, we have 



Xddx^ I I ... I I Xddx 

5(0, s) 



rs-Xd 






Jo ^ 


'o 


Jo 








Xd / 


-r 




Jo 


Jo 


Jo 


Xd^d-l[Sd 


_i(0,s 


- Xd)] dxd / 






Jo 



dx 



■dxd = 



Analogously, for all s > 0, we have 

p /•as /•as /•as 

/ Xddx= / ... / Xddx 

JQ{0,as) Jo Jo Jo 



as pas pas pas 

Xd ... dx = (as)'^-M Xddxd^l/2{as)''+\ 
Jo Jo Jo 
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We now compute the VH-measures of a hypercube and a simplex in the basic case in 
which the distribution H has a linear density, that is, i'_f/(b + x) = a + X]fc=i Cfc^^fe for 
xG 5(0,s) U Q(0,as). For aU s > 0, we obtain 

Vff[5(b,s)] = a / dx + V'cfc / x^dx 

J5(0,s) ^.^^ J5(0,s) 



(5.1) 



Ck 

k=l / \ - ■ - ^^j^ ^ 



Vf/(Q(b,as)) = a / dx + ^Cfc / Xfc dx 

"'Q(0,as) "'Q(0,a,s) 



(5.2) 



Thus, for a linear density vh, the ratio V// [5(b,s)]/V/f [Q(b,Q!s)] can be made inde- 

2 

d+l 



pendent of the parameters h,s,a and of the c^'s, by choosing a — a* ~ ^i^rfi for which 



we have 

VH[Sih,s)] = i^±^VH[Q{h,a*s)]. (5.3) 

With similar computations, we obtain the same result for s < 0. The following theorem 
shows that (5.3) analogously holds for any sufficiently smooth density, in the limit as the 
number n of iterations of the AEP algorithm goes to infinity. 

Theorem 5.1. Assume that H has a twice continuously dijferentiable density vh with 
all partial derivatives of first and second-order bounded by some constant D. We then 
have that 



1 

sup max ,, , , , , „ 



VH[S{hlhl)] - ^^d^Vnmhla^h^^)] 



<A<oo (5.4) 



Jor some positive constant A depending only on the dimension d and the distribution H . 

Proof. For a given bj"j, we can use a Taylor expansion to find some coefficients a and 
Ck, k = 1, . . . ,d, depending on b^, such that 

d 

t;H(b^+x)=a + ^CfeX-fc+ ^ i?0(x)x'5 for aU x G i3(b^), (5.5) 

fe=l 1^1=2 

where i3(b^) is a ball in R'^ centered at b^; such that S(b^) D 5(b^,/i^) U Q(h^,a*h^). 
Note that in equation (5.5), we used multi- index notation to indicate that the sum in 
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the last equation extends over multi- indices /3 S N''. Using the assumption on the partial 
derivatives of vh, the remainder term Rplpc) satisfies the inequality 



|-R/3(x)| < sup 

xeB(bJ) 



1 d^vni^ 



<D 



(5.6) 



for all (3 with \/3\ = 2. Using (5.5) and recalling the expressions (5.1) and (5.2) for a linear 
density and a positive /i^j , we obtain 



VH[Sihth'^)]-^-^±^VH\Q{htah'^)] 



k\d 



JL- \ " 

+ 1 ^ 



Cfe 



k=l 



^ i?/3(x)x'''dx 
"SCO,''!;) \p\=2 



[d + lY 
2M\ 



(ahy (a + \ahl ^ ) + /" ^ Rp{ 

\ ^ fc=l / -^SCO.a/^Ji) |^|^2 



c*^ dx 



Choosing a = a* , the previous expression simplifies to 



VH[S{hthi)] - ^A^VH[Q{hla*hi)] 



f y: ^mx)x^ dx - / 5: i?,(x)x^ dx 



l/3|=2 



< 



E 



<D 



i?^(x)x'3dx 
x^ dx 



+ 



2ddl 
(d + lY 



2<^d\ 

{d+l)d- 



E 



i?/3(x)x^ dx 



2rfd! 



E 



x*^ dx 



|/J|=2"'2(0,a*/i;;) 

where the last inequality follows from (5.6). Using the facts that 

^ / x'^dx = ^ / a:,^dx + 2 ^ / XiXjAvi 



l<i<i<d 



2rfs'^+^ 2d{d - 1)5'^+^ 
(d + 2)! ^ (dT2)^ 

2dV+^ 
(d+2)! 
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and 



1/31=2 



(i(as)'^+2 2d{d - l){asY+'^ _ d{3d - l){as) 



d+2 



we finally obtain 



VH[SiK,h'^)] - i^±^VH[Qihla*h^J] 



k I d+2 



(5.7) 



where A is a positive constant depending only on the dimension d and the distribution 
H . Note that in (5.7), we write h^ in absolute value in order to consider the completely 
analogous case in which is negative. Thus, the theorem easily follows from (5.7). □ 

Equation (5.4) gives a local estimator of the mass of the simplex iS(bJj,ft,Jj) in terms 
of the volume of the corresponding hypercubc Q(b^,/iJ^), which is straightforward to 
compute: 



VH[s{hthi)]: 



{d + iy 

2dd\ 



Vh 



d+l 



(5.8) 



In the case where the density vh is sufficiently smooth, it is then possible, after a number 
of iterations of AEP, to estimate the right-hand side of (3.5) by using the approxima- 
tion (5.8). This procedure defines the estimator P*{s) as 



P„*(s)=P„_i(s) 



(rf+1)^ 
2dd\ 



(5.9) 



fe=i 



In what follows, the use of P^(s) as an approximation of Vff[5(0,s)] will be referred 
to as the extrapolation technique. The following theorem shows that P*{s) converges to 
Vff[iS(0,s)] faster, and in higher dimensions, than Pn(s). 

Theorem 5.2. Under the assumptions of Theorem 5.1, we have, for d <8, that 

hrn P:{s)^VH[SiO,s)]. 

Proof. Using (3.5) and (5.7) in the definition (5.9) of P^(s), wc obtain 
E*{n)^\VH[S{0,s)]~P:{s)\ 



AT" 



T/h[5(0,.)]-P„_i(.)-^^^ E 'nVHlQ'j 

k=l 
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k=l ■ fc=l 



(5.10) 



k=l 



k=l 



where, for the positive sequence e* = X^fcLi I'^n+il'^^^i have that 



|d+2 



-ja 



*\d+2 



d 

E 



|<i+2 



|1 - ja* 



\d+2 



The theorem foUows by noting that the factor f^:{d), defined as 

/*(^) = E(-)|i-j"*i'^' 

j=l 



(5.11) 



is less than 1 for d < 8; see Table 2. In these dimensions, e* , and hence E*{n), converge 
to zero. □ 

We should point out that, due to Theorem 3.3, Theorem 5.2 also remains valid in the 
case where H satisfies the extra smoothness conditions on its first and second derivatives 
only in a neighborhood of Tg- Moreover, under the assumptions of Theorem 5.1, it is 
possible to calculate an upper bound for the error E* (n) as a function of the number of 
evaluations performed by AEP. Indeed, (5.10) can be rewritten as 



E*{n)<AfM'"- 



(5.12) 



Table 2. Extrapolation error ratio /.(d) as defined in (5.11), number fs{d) of new simplexes 
produced at each iteration and convergence rates of the AEP extrapolation error as a function 
of the number of evaluations performed by the algorithm; for d = 9, convergence of AEP is not 
assured (na) 



d 


2 


3 


4 


5 


6 


7 


8 


9 


Md) 
fs{d) 

In /.(d) 
ln/s{<i) 


0.0370 
3 

-3 


0.1250 

4 

-1.5 


0.2339 
15 

-0.54 


0.3580 
21 

-0.34 


0.4982 
63 

-0.17 


0.6556 
92 

-0.09 


0.8314 

255 

-0.033 


>1 
385 
na 
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We now denote by M(n) the total number of evaluations of the joint distribution H 
performed by AEP after the nth iteration. Then, M(n) (as well as the computational time 
used) is proportional to the number of simplexes /s(d)"~^ passed to the nth iteration. 
For all n > 2, we have that 

M(n) = J2 2'/sW' = f - 1) 

fc=0 JS[ ) 

(5.13) 

Here, _B is a positive constant depending only on the dimension d. Combining (5.12) 
and (5.13) gives 

Then, (5.14) provides an upper bound on the AEP approximation error E*(n) as a 
function of the number of evaluations performed. The polynomial rate of convergence 
in/s(d) ^^^^ bound depends only on the dimensionality d. In Table 2, we calculate 
this bound for dimensions d < 8. These numbers can be useful in order to compare 
the efhciency of AEP with that of other algorithms, such as Monte Carlo methods (see 
Section 7 and Table 11). 



6. Applications 

In this section, we test the AEP algorithm on some risk vectors {Xi, . . ., Xd) of financial 
and actuarial interest. For illustrative reasons, we will provide the joint distribution 
function H in terms of the marginal distributions Fx^ and a copula C . For the theory of 
copulas, we refer the reader to [17]. 

In Table 3, we consider a two-dimensional portfolio [d = 2) with Parcto marginals, 
that is, 

Fx. {x) = P[X, < x] = 1 - (1 + x)-^\ a; > 0, i = 1, 2, 

with tail parameters 9i = 0.9 and 62 = 1.8. Wc couple these Pareto marginals via a 
Clayton copula C = C^' with 

Cf (mi, . . . , Ud) = (ur* + u^^ + --- + u-^ -d + 1)-!/^ Ufc e [0, 1], fc = 1, . . . , d. 

The parameter 5 is set to 1.2. For the portfolio described above, we compute the ap- 
proximation P„(s) (see (3.3)) at some given thresholds s and for different numbers of 
iterations n of the algorithm. The thresholds s are chosen in order to have estimates in 
the center as well as in the (heavy) tail of the distribution. For each n, wc provide the 
computational time needed to obtain the estimate on an Apple MacBook (2.4 GHz Intel 
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1 min - 



1 sec - 



0.01 sec - 




2 4 6 8 10 12 14 16 

n 

Figure 4. AEP computation time (on a log-scale) as a function of the number of iterations n, 
for dimensions 2 < d < 5. 

Core 2 Duo, 2 GB RAM). Of course, computational times may vary depending on the 
hardware used for computations. We also provide the estimates obtained by using the 
estimator Pn{s), as defined in (5.9). 

For all iterations n and thresholds s, in Table 3, we provide the differences P„(s) — 
Pie{s) or P*{s) — Piq{s). This has been done in order to show the speed of convergence 
of the algorithm and the increase in accuracy due to extrapolation. The choice of n = 16 
as the reference value in Table 3 represents the maximum number of iterations allowed 
by the memory (2 GB RAM) of our laptop. However, for a two-dimensional vector, 
we see that all iterations after the seventh leave the first eight decimal digits of the 
probability estimate unaltered for all the thresholds. Thus, the estimate Pris) (0.01 
seconds) could already be considered reasonably accurate. We also note that, on average, 
extrapolation allows the accuracy of the estimates to be increased by two decimal digits 
without increasing computational time. 

In Tables 4 {d — 3) to 6 (c? = 5) we perform the same analysis for different Clayton- 
Pareto models in which we progressively increase the number of random variables used. 
In Tables 4-6, the numbers n = 13 for d — 3, n— 7 for d = 4 and n = 6 for d~ 5 again 
represent the maximum number of iterations allowed by the memory (2 GB RAM) of 
our laptop. 

AEP shows good convergence results for all dimensions d and thresholds s under study. 
In higher dimensions d, the extrapolation technique still seems to provide some relevant 
extra accuracy. Memory constraints made estimates for d > 6 prohibitive. For dimensions 
2 < d < 5, Figure 4 shows that the average computational time needed by AEP to provide 
a single estimate increases exponentially in the number of iterations n. These average 
computational times have been computed based on several portfolios of Pareto marginals 
coupled by a Clayton copula. 

Note that Tables 3-6 provide information about the convergence of the algorithm to a 
certain value, but do not say anything about the correctness of the limit. Indeed, we do 



Table 3. Values for Pn{s) and Pn{s) (starred columns) for the sum of two Pareto distributions with parameters Oi = 0.9 and tq 
^2 = 1.8, coupled by a Clayton copula with parameter 5 = 1.2; for all n < 16, we give the difference from the reference value Pie{s) ^ 







n = 16 

(reference value, 49.25 s) 


n = 7 
(0.01 s) 




71=7* 

(0.01 s) 




71= 10 

(0.06 s) 




71= 10* 
(0.06 s) 




71= 13 

(1.61 s) 




71= 13* 
(1.61 s) 




s = 


10° 


0.315835041363441 


-4.46e- 


■09 


-1.46e- 


-11 


-6.16e- 


12 


-3.70e- 


14 


-3.97e- 


■14 


-2.95e- 


14 


s = 


10^ 


0.983690398913354 


-S.lOe- 


■10 


+ 1.83e 


-09 


-1.85e- 


12 


-5.68e- 


13 


-6.64e- 


13 


-6.96e- 


13 


s = 


10" 


0.999748719229367 


-6.62e- 


■08 


-4.13e- 


-08 


-6.41e- 


12 


+6.38e- 


11 


-1.24e- 


12 


-1.26e- 


12 


s = 


lO*' 


0.999996018908404 


-1.63e- 


09 


-1.22e- 


-09 


-5.40e- 


11 


-3.89e- 


11 


-7.80e- 


13 


-5.07e- 


13 



Table 4. This is the same as Table 3, but for the sum of three Pareto distributions with parameters 6i = 0.9, 6*2 = 1.8 and ^3 = 2.6, 
coupled by a Clayton copula with parameter 5 = 0.4 







71= 13 

(reference value, 118.50 s) 


n = 7 
(0.02 s) 




71 = 7* 

(0.02 s) 




71 = 9 

(0.41 s) 




71 = 9* 

(0.41 s) 




71= 11 

(6.65 s) 




71= 11* 

(6.65 s) 




s = 


10° 


0.190859309689430 


-2.28e- 


-06 


+8.80e- 


-07 


-8.53e- 


-08 


+3.31e- 


-08 


-3.15e- 


■09 


+1.32e^ 


-09 


s = 


10^ 


0.983659549676444 


-1.76e- 


-05 


+ 1.13e- 


-06 


-6.55e- 


-07 


+3.01e- 


-07 


-2.17e- 


■08 


+ l.lle- 


-08 


s = 


10* 


0.999748708770280 


-1.72e- 


-06 


-1.12e 


-06 


-3.86e- 


-07 


-2.39e- 


-07 


-6.43e- 


■08 


-2.95e- 


-08 


s = 


10^ 


0.999996018515584 


-2.78e- 


-08 


-1.83e 


-08 


-6.61e- 


09 


-4.26e 


-09 


-1.35e- 


■09 


-7.66e- 


-10 



Table 5. This is the same as Table 3, but for the sum of four Pareto distributions with parameters 9i = 0.9, 62 = 1.8, 63 = 2.6 and 
^4 = 3.3, coupled by a Clayton copula with parameter 5 — 0.2 







71=7 

(reference value, 107.70 s) 


71 = 4 
(0.03 s) 




71 = 4* 

(0.03 s) 




71 = 5 
(0.47 s) 




71 = 5* 
(0.47 s) 




71 = 6 
(7.15 s) 




71 = 6* 
(7.15 s) 




s = 


10* 


0.833447516734442 


-6.31e- 


-03 


+9.42e- 


-05 


-2.21e- 


-03 


+3.71e- 


-04 


-6.04e- 


-04 


+4.00e- 


-04 


s = 


10^ 


0.983412214152579 


-1.61e- 


-03 


-4.95e- 


-04 


-7.14e- 


-04 


-1.54e- 


-04 


-2.45e- 


-04 


+5.01e- 


-05 


s = 


10^ 


0.997950264030106 


-2.14e- 


-04 


-7.37e- 


-05 


-9.91e- 


-05 


-2.70e- 


-05 


-3.60e- 


-05 


+3.68e- 


-06 


s = 


lO'' 


0.999742266243751 


-2.69e- 


-05 


-9.30e- 


-06 


-1.25e- 


-05 


-3.42e- 


-06 


-4.54e- 


-06 


+4.52e- 


-07 



-J 

CD 
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not have analytical methods to compute Vh[S{0, s)] when the vector {Xi, . . . ,Xd) has a 
general dependence structure (copula) C. 

In practice, it is possible to test the accuracy of AEP in particular cases when the 
Xi are independent or comonotonic. Some test cases are analyzed in Tables 7 {d = 2) 
to 9 (d = 4), where we still assume that we have Pareto marginals, but coupled by a 
Gumbel copula C ~ C^" , in which the parameter 7 > 1 is allowed to vary. Formally, for 
Uk € (0, 1], fc = 1, . . . , d, we have 

. ..,Ud) = exp(-[(- Inmy + (- In ,.2)^ + ■•■ + (- Inua)'']^/''). 

In the tables mentioned above, the multivariate model varies from independence (7=1) 
to comonotonicity (7 = +00). In these two extreme (with respect to the dependence 
parameter 7) cases, we compare the analytical values for Vh[5(0,s)] with their AEP 
estimates. Tables 3-6 show that the extrapolated estimator P*{s) provides accurate 
estimates within a very reasonable computational time. A comparison with alternative 
methods is discussed in Section 7. 

The possibility of computing the value Vh[S{0, s)] independently from AEP also allows 
us to test more specifically the effect of extrapolation. For this, we consider two- and 
three-dimensional vectors of independent Pareto marginals. Figure 5 shows the increase of 
accuracy due to extrapolation. Therefore, under a smooth model for H (see Theorem 5.1), 
the extrapolated estimator P*{s) is to be preferred over P„(s). 

Of course, the AEP algorithm can be used to find estimates for the quantile function, 
that is, for the inverse of the distribution of the sum Sd- Such quantiles are especially 
useful in finance and insurance, where they are generally referred to as value-at-risk 
(VaR) or return periods. In Table 10, we calculate, by numerical inversion, VaR at dif- 
ferent quantile levels a for two different three-dimensional portfolios of risks. In order to 
calculate VaR values, we use root-finding algorithms like the bisection method. 




Figure 5. Error from the AEP algorithm with and without the use of the extrapolation tech- 
nique for two test portfolios: two (left) and three (right) independent Pareto marginals with 
parameters di = i, i = 1, 2, 3. 



Table 6. This is the same as Table 3, but for the sum of five Pareto distributions with parameters 8i — 0.9, 02 — 1.8, 63 = 2.6, 
64 = 3.3 and 65 = 4, coupled by a Clayton copula with parameter S = 0.3 







n — 6 

(reference value, 92.91 s) 


n = 3 
(0.01 s) 




n = 3* 
(0.01 s) 




n = 4 
(0.20 s) 




n = 4* 
(0.20 s) 




n = 5 
(4.37 s) 




n — 5* 
(4.37 s) 




s = 


10^ 


0.824132635126808 


-3.12e- 


-02 


+3.89e- 


-03 


-1.55e- 


-02 


+5.66e- 


-04 


-7.77e- 


-03 


+1.46e- 


-04 


s = 


10" 


0.983253494805448 


-5.30e- 


-03 


-f5.07e- 


-05 


-2.86e- 


-03 


-3.57e- 


-04 


-1.54e- 


-03 


-1.90e- 


-04 


s = 


10^ 


0.997930730055234 


-6.72e- 


04 


-5.23e- 


-06 


-3.66e- 


-04 


-5.29e- 


05 


-1.99e- 


-04 


-2.83e- 


-05 


s = 


10* 


0.999739803851201 


-8.45e- 


05 


-7.22e- 


-07 


-4.61e- 


-05 


-6.67e- 


06 


-2.51e- 


-05 


-3.57e- 


-06 



Table 7. Values for P^{s) for the sum of two Pareto distributions with parameters Oi —i, i = 1,2, coupled by a Gumbel copula 
with parameter 7; the values in the first and last columns are calculated analytically; the computational time for each estimate in 
this table is 0.53 seconds with n = 12 







7 = 1 (exact) 


7 = 1 


7 = 1.25 


7= 1.5 


7 = 1.75 


7 = +cx; 


7 = +00 (exact) 


s = 


10" 


0.2862004 


0.2862004 


0.3280000 


0.3527174 


0.3682522 


0.4108029 


0.4108027 


s = 


10" 


0.9898913 


0.9898913 


0.9895957 


0.9894472 


0.9893640 


0.9891761 


0.9891761 


s = 


10^ 


0.9989990 


0.9989990 


0.9989857 


0.9989798 


0.9989766 


0.9989700 


0.9989700 


s = 


lo-* 


0.9999000 


0.9999000 


0.9998995 


0.9998993 


0.9998992 


0.9998990 


0.9998990 



Table 8. This is the same as Table 7, but for the sum of three Pareto distributions with parameters 9i — i, i = 1, 2, 3, coupled by 
a Gumbel copula with parameter 7; the computational time for each estimate in this table is 6.65 seconds with n = 11 







7 = 1 (exact) 


7 = 1 


7 = 1.25 


7= 1.5 


7 = 1.75 


7 = +CX; 


7 = +00 (exact) 


s = 


10^ 


0.1709337 


0.1709337 


0.2348582 


0.2743918 


0.2994054 


0.3667285 


0.3666755 


s = 


10" 


0.9898380 


0.9898380 


0.9893953 


0.9891754 


0.9890526 


0.9887811 


0.9887760 


s = 


10^ 


0.9989985 


0.9989985 


0.9989812 


0.9989734 


0.9989692 


0.9989604 


0.9989606 


s = 


lO'' 


0.9999000 


0.9999000 


0.9998994 


0.9998992 


0.9998991 


0.9998988 


0.9998988 
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Table 9. This is the same as Table 7, but for the sum of four Pareto distributions with param- 
eters 9i=i, i = 1,2,3,4 coupled by a Gumbel copula with parameter 7; the computational time 
for each estimate in this table is 7.15 seconds with n — 6 







7 = 1 (exact) 


7 = 1 


7= 1.25 


7 = 1.5 


7 = 1.75 


7 = +00 


7 = +QO (exact) 


s = 


10" 


0.1040880 


0.1040713 


0.1762643 


0.2244387 


0.2555301 


0.3387648 


0.3390320 


s — 


10^ 


0.9898032 


0.9896608 


0.9892592 


0.9890502 


0.9889268 


0.9886415 


0.9885287 


s — 


10^ 


0.9989981 


0.9989732 


0.9989652 


0.9989616 


0.9989595 


0.9989743 


0.9989558 


s — 


10* 


0.9999000 


0.9998973 


0.9998973 


0.9998973 


0.9998973 


0.9998973 


0.9998987 



We finally note that the choices of copula families (Clayton, Gumbel) and marginal 
distributions used in this section are purely illustrative and do not in any way affect 
the functioning of the AEP algorithm. The same performances were reached for vectors 
showing negative dependence, as in the case of d Pareto marginals coupled by a Frank 
copula with negative parameter. 

The accuracy of AEP is not sufficient to estimate high level quantiles in dimensions 
(i = 4,5, as done in Table 10 for some three-dimensional portfolios. The algorithm can, 
however, be used to compute a numerical range for the quantiles of the sum of four and 
five random variables. The error resulting from AEP in these higher dimensions turns 
out to be extremely small if compared to the error due to statistical inference. As a 
comparison to statistical methods, we estimate the VaR of the sum of the five Pareto 
marginals described in Table 6 via extreme value theory (EVT) methodology in its "peaks 
over threshold" (POT) form; see [15], Section 7.2. We set the quantile level a = 0.999, a 
value not uncommon in several risk management applications in insurance and finance. 
The POT method is widely used for calculating quantiles in the presence of heavy-tailed 
risks and is known to perform very well in the case of exact Pareto models, such as the 
one studied here. In order to focus on the statistical error produced by the POT method, 
we use, as data, a sample of M realizations from the portfolio described in Table 6. It 

Table 10. Value-at-risk for: (a) a three-dimensional portfolio with marginals Fi =Exp(0.2), 
F2 = Logn(/i = — 0.5,a^ = 9/2), F3 = Pareto(1.2) and a Gumbel copula with 7 = 1.3; (b) a 
three-dimensional portfolio with Pareto marginals with parameters 81 = 0.8, = 1, O3 = 2 and 
a Clayton copula with S — 0.4; the computation of all VaR estimates needs approximately 49 
seconds with 71 = 10 



a 


VaRL"' 


VaRL"' 


0.9 


24.76 


32.87 


0.99 


137.67 


445.36 


0.999 


700.20 


6864.58 


0.9999 


3394.78 


112442.31 


0.99999 


17962.78 


1903698.40 


0.999999 


108190.96 


32889360.00 
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Figure 6. Estimates of VaRo.999 for tiie sum of tlie five Pareto marginals described in Table 6, as 
a function of the threshold used for estimation. Estimates are obtained via POT from M = 5e— 03 
(left) and M — le— 06 (right) simulated data. Along with POT estimates, we give the numerical 
range for the 0.999-quantile obtained via AEP. 



is well known that the statistical reliability of the POT approach is very sensitive to the 
choice of the threshold u beyond which a GPD distribution is fitted. In Figure 6, we 
plot the VaR estimates obtained by choosing different thresholds u. The picture on the 
left is obtained by generating AI = 5000 data, while the one on the right uses M ~ 10^ 
simulations. It is remarkable that, even in an ideal 10^ data world, the statistical range 
of variation of the VaR estimates obtained via POT is broader than the numerical VaR 
range calculated via AEP. Moreover, the POT range of values depends on the specific 
sample used for estimation, while the AEP range is deterministic. In the next section, 
we will compare AEP with more competitive numerical techniques such as Monte Carlo, 
quasi-Monte Carlo and quadrature methods. 



7. A comparison with Monte Carlo, quasi-Monte 
Carlo and quadrature methods 

For the estimation of Vh[S{0, s)], the main competitors of the AEP algorithm are proba- 
bly Monte Carlo and quasi-Monte Carlo methods. Given M points Xi, . . . ,xj\/ in 5(0, s), 
it is possible to approximate Vff[iS(0,s)] by the average of the density function vh eval- 
uated at those points, that is, 

r 1 

Vh[S{0,s)]= dH(x)^--^^,H(x.). (7.1) 

If the Xi's are chosen to be (pseudo-)randomly distributed, this is the Monte Carlo (MC) 
method. If the x^'s are chosen as elements of a low-discrepancy sequence, this is the 
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Table 11. Asymptotic convergence rates of the AEP, standard MC and QMC methods 



d 


2 


3 




4 


5 


AEP (upper bound) 




M- 


-1.5 




^-0.34 


MC 




M- 


-0.5 


^-0.5 




QMC (best) 




M- 


-1 






QMC (worst) 


M~'^{logMf 


M- 


'\log Mf 


M-^(logM)* 


M-i(logM)5 



quasi-Monte Carlo (QMC) method. A low- discrepancy sequence is a totally deterministic 
sequence of vectors that generates representative samples from a uniform distribution on 
a given set. With respect to Monte Carlo methods, the advantage of using quasi-random 
sequences is that points cannot cluster coincidentally on some region of the set. However, 
randomization of a low-discrepancy sequence often improves performance; see [12]. 

In recent years, various methods and algorithms have been developed in order to reduce 
the variance of MC and QMC estimators and to obtain probabilities of (rare) events 
with reasonable precision and effort. For details on the theory of rare event simulation 
within MC methods, we refer the reader to [2, 10, 13, 14]. For an introduction to quasi- 
Monte Carlo methods and recent improvements, we refer to, for instance, [18] and [12]. 
A comprehensive overview of both methods is given in [21]. 

Using central limit theorem arguments, it is possible to show that traditional MC, 
using (pseudo-) random numbers, has a convergence rate of 0(Af^^/^), independently of 
the number of dimensions d. QMC can be much faster than MC with errors approaching 
0(A/~^) in optimal cases (see [16]), but the worst theoretic rate of convergence decreases 
with the dimension d as 0((logM)'^M~^); see [18]. In applications to finance and insur- 
ance, it is more common to get results closer to the best rate of convergence if the 
density vh is smooth, that is, has a Lipschitz-continuous second derivative. In this case, 
it is possible to show that the convergence rate is at least 0((log Af)''A/~'^/^); see [6]. In 
Table 11, we compare convergence rates of MC and QMC methods with respect to the 
AEP rates (depending on d), as provided in Section 5. We thus expect a well-designed 
QMC algorithm to perform better, asymptotically, than AEP under a smooth probability 
model and for dimensions d > 4. Because of the computational issues for AEP in higher 
dimensions, we restrict our attention to d < 5 in Table 11. 

Don McLeish kindly adapted an algorithm using a randomized Korobov low- 
discrepancy sequence to the portfolio leading to Table 3. The parameters for the sequence 
are those recommended in [9]. The standard errors (s.e.'s) are obtained by independently 
randomizing ten (part (a) of the table) and fifty (part (b) of the table) sequences with 1 
million terms each, corresponding to AI = le— 07 (a) and M = 5e— 07 (b). The average 
CPU times are, of course, on a different machine (IBM Thinkpad 2.5 GHz Intel Core 2 
Dual, 4 GB RAM). In Table 12a, we provide the comparison between QMC and AEP 
extrapolated estimates. The results seem to be coherent with Table 11 above. For the 
same precision, AEP is much faster than QMC in the two-dimensional example and 
slightly slower for d = A. Recall that, in higher dimensions, programming a randomized 
Korobov rule is much more demanding than using AEP. 
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What is important to stress here is that in MC and randomized QMC methods similar 
to the one apphed in Table 12a, the final estimates contain a source of randomness. 
Contrary to this, the AEP algorithm is deterministic, being solely based on geometric 
properties of a certain domain. Moreover, the accuracy of MC and QMC methods is 
generally lost for problems in which the density vh is not smooth or cannot be given in 
closed form, and comes at the price of an adaptation of the sampling algorithm to the 
specific example under study. Recall that the AEP algorithm does not require the density 
of the distribution H in analytic form, nor does it have to assume overall smoothness. 
Finally, the precision of MC methods depends on the threshold s at which Vh[S{0, s)] is 
evaluated: estimates in the (far) tail of the distribution will be less accurate. 

The re-tailoring, from example to example, of the rule to be iterated is also common to 
other numerical techniques for the estimation of Vh[S{0, s)] such as quadrature methods; 
see [7] and [19] for a review. However, in the computation of multi-dimensional integrals, 
as in (7.1), numerical quadrature rules are typically less efficient than MC and QMC. 

When the random variables Xi,...,Xd are exchangeable and heavy-tailed, some 
asymptotic approximations of Vh[S{0, s)] for large s can be found in [3, 11] and refer- 
ences therein. It is important to remark that the behavior of AEP is not affected by the 
threshold s at which Vh[S{0, s)] is computed, nor by the tail properties of the marginal 
distributions ■ This is particularly interesting as, under heavy-tailedness, the relative 
error of MC and QMC methods increases in the tail of the distribution function of Sd- 

We are, of course, aware that a well-designed quadrature rule or a specific quasi- 
random sequence might perform better than AEP in a specific example, with respect to 
both accuracy and computational effort. However, AEP provides very accurate estimates 



Table 12a. AEP and QMC (using Korobov sequence) estimates for Vff[<S(0,s)] for the sum of 
Two Pareto distributions with parameters 6i — 0.9 and 62 = 1.8, coupled by a Clayton copula 
with parameter S — 1.2 



s 


AEP estimate (n = 14, 4.87 s) 


QMC estimate (M =le-07, 6.6 s) 


QMC 


s.e. 


10° 


0.315835041363413 


0.3158345 


+2.7e- 


-06 


10^ 


0.983690398912470 


0.98369106 


+ 1.0e 


-06 


10* 


0.999748719228038 


0.99974872 


+ 1.5e- 


-07 


10^ 


0.999996018907752 


0.999996 


+A.Oe- 


-08 



Table 12b. AEP and QMC (using Korobov sequence) estimates for Vh[S{0,s)] for the sum 
of four Pareto distributions with parameters Si = 0.9, ^2 = 1.8, 63 = 2.6, ^4 = 3.3, coupled by a 
Clayton copula with parameter 5 = 0.2; computational times are also provided 



s 


AEP estimate [n = 7, 107.70 s) 


QMC estimate (M =5e-07, 95 s) 


QMC 


s.e. 


10^ 


0.833826902853978 


0.83380176 


+3.6e- 


-06 


10^ 


0.983565803484355 


0.98362452 


+9.Qe- 


-07 


10^ 


0.997972831330699 


0.997997715 


+2.3e- 


-07 


10* 


0.999745113409911 


0.999748680 


+5.0e 


-08 
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of the distribution of sums up to five dimensions in a reasonable time without the need 
to adapt to the probabihstic model under study. AEP can handle, in a uniform way, 
any joint distribution H, possibly in the form of its copula and marginal distributions. 
Because of its ease of use and the very weak assumptions upon which it is based, AEP 
offers a competitive tool for the computation of the distribution function of a sum of up 
to five random variables. A Web-based, user-friendly version has been programmed and 
will eventually be made available. 

8. Final remarks 

In this paper, we have introduced the AEP algorithm in order to compute numerically 
the distribution function of the sum of d random variables Xi, . . . ,Xd with given joint 
distribution H. The algorithm is mainly based on two assumptions: the random vari- 
ables Xi are bounded from below and the distribution H has a bounded density in a 
neighborhood of the curve defined in (3.7). Under this last assumption, the sum Sd 
has to be continuous at the threshold s where the distribution is calculated, that is, 
^[Sd = s] ~ 0. When, instead, [F^] > 0, the algorithm may fail to converge. As an ex- 
ample, take two random variables Xi and X2 with P[A:i = 1/2] = ¥[X2 = 1/2] = 1. Then, 
Vf/[iS(0, 1)] = 1, but the sequence fn(l) alternates between and 1. Similar examples 
for arbitrary dimension d can easily be constructed. 

If H has at least a bounded density near Fg. then the convergence of the sequence P„(s) 
to the value V/f[5(0,s)] is guaranteed. As already remarked, the speed of convergence 
may vary, depending on the probability mass of a neighborhood of Tg- Tools to increase 
the efficiency of the algorithm arc therefore much needed in these latter cases. 

The AEP algorithm has been shown to converge when d < 5 if the joint distribution 
H of the vector {Xi, . . . ,Xd) has a bounded density vh- Under some extra smoothness 
assumptions on vh, convergence holds when d < 8. All of these conditions can be weak- 
ened to hold only in a neighborhood of the curve Fs and arc satisfied by most examples 
which are relevant in practice. 

We were not able to prove convergence of AEP in arbitrary dimensions, although we 
conjecture this to hold. The main problem in higher dimensions is the non-monotonicity 
of Pn{s) and P*i{s). This results from the fact that the s^'s, as defined in (3.4), may be 
positive as well as negative. From a geometric point of view, the main problem is the fact 
that the simplexcs iS^+i, fc = 1, . . . , N", passed to the (n + l)th iteration of the algorithm, 
arc generally not disjoint for d> 2. As illustrated in Table 1, the sum of the Lebesgue 
measures of the iS^_|_]^'s is increasing in the number n of iterations when d> 6, while their 
union always lies in some neighborhood of the curve F^ . A general convergence theorem 
may need a volume decomposition different from (3.1) and using only a family of disjoint 
simplexes, or else an extension of the extrapolation technique. 

We also remark that the statement of a general convergence theorem will not entail 
any practical improvement of AEP, since memory constraints limit the use of the algo- 
rithm to dimension d < 5. However, in these manageable dimensions, we expect the AEP 
convergence rates to be better than their upper bounds given in Table 2. 
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Apart from the study of convergence of AEP in higher dimensions, in future research, 
we will also address an extension of the algorithm to more general aggregating functions 
ipi^i, . . . ,X(i) and the study of an adaptive (i.e., depending on H) and more efficient (in 
terms of new simplcxcs produced at each iteration) decomposition of the simplexes. 

Appendix: Proof of (3.1) 

Recall that, in Section 1, wc denoted by io, . • . , i-N all of the 2^ vectors in {0, 1}'', with 
io = (0, . . . , 0), ifc = efc, fc = 1, . . . , d, and ijv = 1 = (1, . . . , 1), where N = 2'^ — l. Also, recall 
that #i denotes the number of I's in the vector i, for instance, #io = 0, #iAr = d. 

Theorem A.l. For any heW\ heM. and a e [1/d, 1), we have that 

N 

Vh [S{h, h)] = Vumh, ah)] + m= Vh [S{h^ , h^ )] , 

i=i 

where, for all j = 1, . . . , N , 

b^' = b + ahij, W = (1 - #ija)/i. 



0, if #i, = II a, 



(A.l) 



Note that (A.l) is equivalent to (3.1) under the notation introduced in Section 3. In 
order to prove the above theorem, we need some lemmas. In the following, 5ij denotes 
the Kronecker delta, that is. 



5^J 



0, ifz^j, 

1, ifi = j. 



Lemma A. 2. Fix i,j E D with i ^ j . Then, for any /i, s G R with hs>Q and b e M.^, we 
have that 

S{h + he,,s) n S{h + he„s) ^H^^^ ^""^ + ' % 'J' 5 \'\' 

Proof. 

Proof of C. First, assume < s < /i. By definition (1.4), for a vector x G iS(b + /le^, s), 
we have that 

d 

Xk>bk+5ikh,keD and y^(xfc - bfc - hkh) < s, 

k=l 
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from which it foUows that 

Xj <bj + s ~ ^^{xk - bk - Sikh) < bj + s< b.j + h, 

that is, X ^ S{h + hej,s). Now, assume that < h < s. For a vector x S S{h + hei, s) n 
S{h + hej,s), we have that 

Xk — bk > 0,k ^ D with Xi> bi + h and Xj > bj + h. (A. 2) 

Again, x G S{h + hsi, s), therefore X]fc=i(^fc ^ i^k + hSik)) < s. Subtracting h from both 
sides of the last inequahty, we obtain 

d 

Yl^^k-{hk + h5^k + h5jk))<s-h. (A.3) 
fc=i 

Equations (A. 2) and (A.3) show that x G S{h + hej + hei, s — h). The case h,s <Q is 
analogous. 

Proof of D. If < s < ft., there is nothing to show. Suppose, then, that < h < s. For 
any fixed x e S{h + hej + hei,s — h), (A.3) holds with Xk — {bk + hSik + hSjk) >0, k G D. 
By adding hSjk in the sum on the left-hand side and h to the right-hand side of (A.3), 
we find that 

d 

^{xk-ibk + h6,k))<s. (A.4) 

Since {xk — {bk + hSik)) is still positive for all k € D, (A.4) shows that x G S{h + hei, s). 
By similar reasoning, we also have that x G tS(b -I- hej,s). The case /i, s < is analogous; 
the case /is = is trivial. □ 

Lemma A.3. For any b G M'', /i G M and a G (0, 1), we have that 

d 

S{h, h) \ Q(b, ah) = (J S{h + ahok^h — ah). 

k=l 

Proof. 

Proof of C. First, assume that ft > 0. If x G iS(b, ft) \ Q(b, ah), then Xk > bk, k E D and 
{^k — bk) < ft, while, by definition (1.2), there exists a j G D such that Xj — bj > ah. 
For this j, it is then possible to write 

d 

"^{xk - {bk + Sjkah)) <h~ ah with Xk - {bk + Sjkah) > 0,k e D, (A. 5) 

k=l 

which yields x G S{h + ahej, h ~ ah) C ljfc=i '^(^ + (^hek, ft — ah). 



The AEP algorithm 



589 



Proof of D. Let x e Ufc=i S{h + ahek, h — ah), meaning that there exists j € D for which 
X satisfies (A. 5). It follows that xj > bj + ah (hence x ^ Q(b, ah)) and J2k=i i^k — bk) < 
h — ah + ah = h. Noting that (A. 5) also implies Xk > bk, k G D, we finally obtain that 
X G S{h, h) \ Q(b, ah). The case ft. < is analogous, while the case ft, = is trivial. □ 

Lemma A. 4. For any b G W^, ft G R and a G [l/d, 1) , we have that 

Q(b, aft) \ S{h, ft) = S{h + aftl, ft - adh) n Q(b, ah). 

Proof. 

Proof of C. If a = 1/d, then the lemma is straightforward. So, choose a G 1) and 
assume ft > 0. If x G Q(b, ah) \ S{h, ft), then Xk > bk for all k E D. Since x ^ iS(b, ft), it 
follows that '^i^i{xi — bi) > ft. Since Xk < bk + ah for all k E D, we can write 

d 

^(zfc -bk~ ah) > ft - adh with Zfc - {bk + aft) < for all keD. (A. 6) 

fc=i 

As ft — daft = ft(l — da) < 0, we conclude that x G S{h + aftl, ft — adh) and, hence, by 
assumption, x G S{h + aftl, ft — adft) fl Q(b, aft). □ 

Proof of D. Let x G iS(b + aftl, ft — adft) n Q(b,aft). Due to ft — adh < 0, it follows 
that (A. 6) holds, implying that X]fc=i(^fc — bk) > ft, that is, x ^ 5(b,ft). The case ft < 
is analogous, while the case ft = is trivial. □ 



We are now ready to prove the main result in this appendix. 



Proof of Theorem A.l. The case ft = is trivial. Suppose, then, that ft 7^ 0. From the 
general property of two sets A, B that B ^ {AU {B\A))\{A\ B), {A\ B) C AU {B \A) 
and Ar\(B\A) = 0, it follows that 

Vh [5(b, ft)] = Vh [Q(b, aft)] + [5(b, ft) \ Q(b, aft)] - Vh [Q(b, aft) \ 5(b, ft)] . (A.7) 

Using the notation 5'"' = S{h + ahek, ft — aft). Lemma A. 3 implies, for the second sum- 
mand in (A.7), that 



VH[S{h,h)\Q{h,ah)]=VH 



.k=l 



k=l 



ICD,\I\=k Sg/ 

Fixing I d D with / = {rii, . . . , n^-}, itcratively using Lemma A. 2 yields 



(A.8) 



S{h + ahe„i , ft — aft) 



iG/ 



_ j l^b + aft e„^. , ft(l — fca)J , iffca<l, 
!, iffca>l. 
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Substituting this last expression into (A. 8) implies that 

Vff[cS(b,/i)\Q(b,a/i)] = ^ VH[Sih + ahir,h{l-ka))] 

ken, i^G{o,i}'', 

ka<l #i,.=fc 

(A.9) 

= E (-l)*'+'^H['5(b + aM,/i(l-#ia))]. 
ie{o,i}'', 

0<#i<l/Q 

Using Lemma A. 4 for the third summand in (A. 7), we can also write that 

VH[Q{h,ah)\S{h,h)] 

= VH[S{h + ahl, h - adh) D Q(b, ah)] (A.IO) 

= Vh [S{h + ahl, h - adh)] - Vh [S{h + ahl, h - adh) \ Q{h, ah)] . 

Note that if a = 1/d, then the quantity in (A.IO) is zero. We can hence assume that 
a ^ 1/d. Observing that Q{h,ah) = Q{h + ahl, —ah) and defining b = b + ahl, a = 
—a/ (1 — ad) > 1/d and h — h{l — ad), we can write 

Vh [S{h + ahl,h- adh) \ Q{h, ah)] = Vh [S{h, h) \ Q{h, ah)] . 

Note that the right-hand side of the previous equation is empty if ci > 1, that is, a S 
{l/d,l/{d— 1)]. At this point, equation (A.9) yields 

VH[S{h + ahl, h - adh) \ Q(b, ah)] 

= E (-l)*'+'^H['5(b + aM,/i(l-#id))] 
ie{o,i}'', 

0<#i<l/a 

= E {-l)*'^'VH[S{h + ah{l-i),h{l-a{d~#i)))]- 

i£{0.1}'', 
0<#i<d-l/Q 

Substituting i = 1 — i (#i — d — #i) into the previous equation, we can equivalently 
write 

Vh [S{h + ahl,h- adh) \ Q(b, ah)] 

(A.ll) 

- E {-^f'*'^'VH[S{h + ahi,h{l-#la))]. 

l/a<#i<d 

In keeping with what was noted above, this last equation is null in the aforementioned 
case in which ce>l. Recalling (A.IO) and noting that 

iS(b + ahl, h — adh) — S{h + ahi^, h{l — ^'iNaj), 
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we obtain 

VH[Q{h,h)\Sih,ah)] 

= VH[S{h + ahiN, h{l - #iAra))] 

(A.12) 

- i-^f'*'^'VH[Sih + ahlh{l-#ia))] 

ie{o,i}'i, 

l/a<#i<d 

= E {-l)''-*'VH[S{h + ahi,h{l-#la))]. 

1/Q<#i<rf 

Finally, recalling the definitions in (A.l), we substitute equations (A. 9) and (A.12) 
into (A. 7) to obtain 

VH[S{h,h)]^VH[Q{h,ah)]+ (-l)*'+''^ff[5(b + aM,Ml-#ia))] 

ie{o,i}'', 

0<#i<l/a 

- E {-'^y'*'VH[S{h + ahih{l-#\a))] 
ie{o.i}'^, 

1/Q<#i<d 

N 

= Vh [Q(b, ah)] + y m^Vn [S{h^ , W )] . 

u □ 
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